Reproducible data analysis:
Document and share exactly how you analyzed your data
Do more with your analysis, more efficiently:
More control and flexibility
Use community-created analysis tools
Leverage cancer data resources (often large datasets)
Create awesome visualizations
Get everyone past the ‘valley of despair’ in the R learning curve
Convince you that R is an accessible and useful tool for you in your research
Prepare you to tackle BootCamp projects next week
Get you excited to keep developing these coding skills!
Upsides
Free
Great for data analysis and visualization
LOTS of bioinformatics/stats tools available
Downsides?
It’s a hodge-podge
Not the best for engineering software
[LIVE DEMO]
What this course is
Coding basics
Heavy emphasis on practical skills (data wrangling, visualization)
Flagging areas with technical depth but giving the ‘need-to-know’
What this course is not
Intro to computer science
Intro to stats
Presentations: lecture style, presenting key concepts
Practice workbooks: Hands-on practice with small groups
Weekend homework assignments: 1 each weekend
Resources
Instructors:
TAs: (TBD)
Course website: Bootcamp_R_tutorials
Slack #c3-bootcamp_r
Chester Ismay’s DataCamp slides
HBC Intro to R course
Sam Meier’s 2018 R lectures
Hadley Wickham’s R for Data Science
Your feedback is very much appreciated (Slack, email, etc)
We’ll do our best to adapt as we go.
R Markdown/Notebooks
Sort of like a lab notebook for analysis
Easily share results and methods in different formats
Encourages good code and analysis practices
Where the action happens!
Provide inputs to R
See outputs of commands you give it (each on separate line)
Organize your work as ‘Projects’ in RStudio
Each project has a separate folder, with data, code, results.
Create a ‘Project’ in RStudio
List of key math operations
*: multiplication
/: division
+: addition
-: subtraction
^: ‘raise to the power’
==: check equals
!=: check not equals
>: greater than, <: less than
>=: greater than or equal to, etc.
[1] FALSE
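A quick sketch of these operators typed at the R console. The numbers are arbitrary illustrations, not taken from the course materials.

```r
# Arithmetic operators
3 * 4     # multiplication, gives 12
10 / 4    # division, gives 2.5
2 + 5     # addition, gives 7
9 - 3     # subtraction, gives 6
2 ^ 3     # raise to the power, gives 8

# Comparison operators answer with TRUE or FALSE
3 == 4    # check equals, gives FALSE
3 != 4    # check not equals, gives TRUE
5 > 2     # greater than, gives TRUE
5 >= 5    # greater than or equal to, gives TRUE
```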
Variables:
Functions:
<-
[1] 12
All object-creation statements have the form
object_name <- value
You can use =, but <- makes for better R code
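A minimal sketch of the object_name <- value pattern. The names x, y, and sqrt_x are made up for illustration.

```r
x <- 3 * 4          # create an object named x that holds 12
x                   # typing the name prints its value
y = 5               # = also assigns, but <- is the usual R convention
sqrt_x <- sqrt(x)   # a function takes inputs in parentheses and returns a value
```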
[1] TRUE
Factors (categorical variables)
e.g. (‘bad’, ‘OK’, ‘good’, ‘great’)
We will mostly try to avoid these, but be aware of them.
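For reference, a small sketch of what a factor looks like, using the categories from the example above. The object names and the ratings data are assumptions for illustration.

```r
# A character vector of ratings (made-up data)
ratings <- c("good", "bad", "great", "OK", "good")

# Convert to a factor, stating the allowed categories and their order
rating_factor <- factor(ratings, levels = c("bad", "OK", "good", "great"))

levels(rating_factor)       # the categories, in the order given above
as.integer(rating_factor)   # factors are stored as integer codes behind the scenes
table(rating_factor)        # count how many values fall in each category
```

The integer codes are one reason factors can surprise you: what looks like text is stored as numbers.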
Object names must start with a letter and can contain letters, numbers, ‘_’, and ‘.’.
Not allowed: 4th, my var, weird?, etc.
Object naming is important for writing good, readable code
Make variable names descriptive
avgClicks
calculate_avg_clicks
NOT: var1 or a
Code readability is huge so others can understand what you’ve done
In RMarkdown docs, write descriptive text before each code chunk
Also good to add comments to key lines of code within chunks
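A short sketch of descriptive names and comments in practice. It reuses the names avgClicks and calculate_avg_clicks from above; clicks_per_day and its values are made up.

```r
# Number of clicks recorded on each of three days (made-up data)
clicks_per_day <- c(10, 12, 14)

# A descriptive function name says what the function does
calculate_avg_clicks <- function(clicks) {
  mean(clicks)   # average of all values in the vector
}

# A descriptive object name says what the value means
avgClicks <- calculate_avg_clicks(clicks_per_day)
print(avgClicks)

# Names like these are not allowed and would cause an error:
# 4th <- 1
# my var <- 1
```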
See info on current variables
Clearing variables
View data tables, etc.
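These tasks also have console equivalents. A minimal sketch, assuming two made-up objects named gene_count and sample_name:

```r
gene_count <- 4
sample_name <- "tumor_01"

ls()                   # list the names of objects in the current environment
str(sample_name)       # compact summary of one object
rm(gene_count)         # clear a single variable
exists("gene_count")   # check whether the object is still there

# rm(list = ls())      # clears ALL variables, so use with care
# View(my_table)       # opens a data table in the RStudio viewer (my_table is hypothetical)
```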
Vectors
Lists
Matrix
Dataframes
Ordered collection of values. Like a sequence of ‘buckets’
Can hold numeric, logical, or string data (all elements of one vector share the same type)
Create vectors with c()
num_vec <- c(1, 2, 3, 4)
log_vec <- c(TRUE, TRUE, FALSE, F)
str_vec <- c('this', 'is', 'a', 'vector', 'of', 'strings')
print(num_vec)
[1] 1 2 3 4
Quick notes on missing values in R (will be important)
NA (‘not available’) is a special value for missing data that can be included in any type of vector
[1] 1 2 NA
[1] "a" "b" NA
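A sketch of how vectors like the ones printed above could be created, and how NA affects a calculation. The object names are assumptions.

```r
num_with_na <- c(1, 2, NA)        # numeric vector with a missing value
str_with_na <- c("a", "b", NA)    # character vector with a missing value

is.na(num_with_na)                # TRUE where a value is missing
mean(num_with_na)                 # the result is NA because one value is unknown
mean(num_with_na, na.rm = TRUE)   # drop NAs first, gives 1.5
```

Many summary functions accept na.rm = TRUE, which is why missing values will matter later on.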
c() can also be used to add new elements to a vector
[1] "TP53" "PLEC" "DSPP" "PIK3CA" "BRAF"
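One way to get the output above (a sketch, not necessarily the exact code used in the live demo) is to pass the existing vector and the new element to c():

```r
string_vec <- c("TP53", "PLEC", "DSPP", "PIK3CA")

# c() puts the old elements first, followed by the new one
string_vec <- c(string_vec, "BRAF")
print(string_vec)
```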
Combining two vectors
string_vec <- c("TP53", "PLEC", "DSPP", "PIK3CA")
string_vec2 <- c("BRAF", "EGFR", "DUSP4")
c(string_vec, string_vec2)
[1] "TP53" "PLEC" "DSPP" "PIK3CA" "BRAF" "EGFR" "DUSP4"
Lists are basically like relaxed vectors, where elements don’t have to be the same type
[[1]]
[1] "a"
[[2]]
[1] 1
[[3]]
[1] TRUE
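A list like the one printed above can be made with list(). In this sketch the name my_list is an assumption, and the c() call is included only for contrast.

```r
my_list <- list("a", 1, TRUE)   # each element keeps its own type
print(my_list)

class(my_list[[1]])             # "character"
class(my_list[[2]])             # "numeric"
class(my_list[[3]])             # "logical"

# A vector forces everything to one type, here character
mixed_vec <- c("a", 1, TRUE)
print(mixed_vec)
```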
You can combine lists with the c() function as with vectors
[[1]]
[1] "a"
[[2]]
[1] 1
[[3]]
[1] TRUE
[[4]]
[1] "c"
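A sketch of one way to produce the four-element list above. The names my_list and longer_list are made up.

```r
my_list <- list("a", 1, TRUE)

# Combine two lists: the elements of the second are added to the end
longer_list <- c(my_list, list("c"))
print(longer_list)
length(longer_list)   # the combined list has 4 elements
```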
Most common way of interacting with data
Each column is a vector, and they can hold different kinds of data
Like an Excel table.
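A minimal sketch of building a dataframe from vectors. The table name, column names, and all values are made up for illustration.

```r
# Each argument becomes a column; all columns must have the same length
mutation_df <- data.frame(
  gene = c("TP53", "PIK3CA", "BRAF"),     # text column
  n_samples = c(12, 7, 3),                # numeric column
  is_oncogene = c(FALSE, TRUE, TRUE),     # TRUE/FALSE column
  stringsAsFactors = FALSE                # keep text as text, not factors
)

print(mutation_df)
str(mutation_df)        # shows the type of each column
mutation_df$n_samples   # pull out one column as a vector with $
nrow(mutation_df)       # number of rows
ncol(mutation_df)       # number of columns
```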
Use RMarkdown documents like an experiment notebook
Create variables with <-; naming them well is important
Variables can be numbers, text (strings) or TRUE/FALSE (boolean)
Data organized as vectors, lists, matrices, and dataframes
Create/add to vectors (or lists) with c()
Make lists with list()